WHO WE ARE

The  U.S.  Army  Corps  of  Engi¬ 
neers,  Waterways  Experiment 
Station  (CEWES)  Major  Shared 
Resource  Center  (MSRC)  is 
part of the High Performance
Computing  Modernization  Pro¬ 
gram  (HPCMP)  and  is  located 
in  the  Information  Technology 
Laboratory  at  CEWES  in  Vicks¬ 
burg,  MS.  As  a  world-class 
facility,  the  CEWES  MSRC 
employs  a  technical  staff  to 
provide  full-spectrum  compu¬ 
tational  support  for  DoD 
researchers,  from  Help  Desk 
assistance  to  one-on-one 
collaboration.  More  than  4,000 
computational  scientists  and 
engineers  are  involved  in  the 
HPCMP  with  immediate  access 
to  DoD  HPC  capabilities,  re¬ 
gardless  of  their  locations  across 
the  nation,  via  the  Defense  Re¬ 
search  and  Engineering  Network. 

Other  services  include  a  diverse 
and  well-equipped  Scientific 
Visualization  Center  for  visuali¬ 
zation  expertise  and  capability. 

In  addition,  the  Programming 
Environment  and  Training  com¬ 
ponent  provides  for  transfer  of 
cutting-edge  HPC  technology 
and  training  and  development 
activities  for  acquiring  HPC 
skills. 
The Front Cover:

This  image  shows  a  four-tined  mine  plow  configuration  moving  from  left  to  right  through  a 

soil  mass  of  10  million  particles.  Colors  on  the  plow  indi¬ 
cate  pressures  exerted  on  the  tines  from  low  (blue)  to  high 
(orange/red). Recent computational developments have
increased the number of particles that can be simulated using
this  technology  from  a  few  hundred  thousand  to  10  mil¬ 
lion  particles.  The  very  large-scale  discrete  element  model 
provides  a  virtual  laboratory  for  evaluation  of  vehicle- 
terrain  interaction  effects  that  previously  required  expen¬ 
sive  laboratory  and  field  studies. 


CEWES  MSRC  and  Partners 
Embark  on  MetaCenter  Project 


Judith  Utley 

The  CEWES  MSRC,  in  collabora¬ 
tion  with  the  Aeronautical  Systems 
Command  (ASC)  MSRC,  has  cre¬ 
ated  an  IBM  SP-based  MetaCenter. 
This  MetaCenter  allows  users  at 
each  site  to  submit  jobs  to  their  local 
batch  queuing  system  with  the 
scheduler,  depending  on  job  require¬ 
ments  and  system  load,  scheduling 
the  job  on  the  most  appropriate  IBM 
SP  system.  The  underlying  technol¬ 
ogy  was  developed  by  NASA  and 
used  in  its  implementation  of  a 
MetaCenter  between  the  NASA 
Ames  Research  Center  and  NASA 
Langley  Research  Center.  The  NASA 
MetaCenter  existed  from  October 
1996 until the IBM SP systems were
decommissioned  in  February  1998. 

The  Portable  Batch  System  (PBS) 
developed  at  NASA  Ames  and  now 
supported  by  MRJ  Technology  Solu¬ 
tions  provides  the  glue  that  bonds 
the  two  sites  and  makes  the 
MetaCenter  work.  PBS  operates  in 
networked  UNIX  environments 
and  provides  very  flexible  batch 
scheduling  for  both  batch  and  inter¬ 
active  work.  A  separate  job 
scheduler  runs  on  each  system,  mak¬ 
ing  it  easy  to  implement  and  modify 
site-specific  policies.  The  job  sched¬ 
uler,  external  to  PBS,  determines 
which  jobs  to  run  when  and  on 
which  system.  Each  system  works 
independently  so  long  as  the  jobs 
queued  and  jobs  running  allow  it  to 
keep  busy  (i.e.,  above  a  configurable 
utilization  threshold).  Once  the 
work  load  drops  below  this  prede¬ 
fined  utilization  threshold,  the  job 
scheduler  begins  peer  scheduling. 


When a system needs to find work
to increase utilization, the job
scheduler asks other systems in the
MetaCenter (its "peers") for a list of
the jobs queued on each remote
system. It proceeds through this
job  list,  checking  each  candidate  to 
see  which  jobs,  if  any,  it  can  run 
(based  on  local  scheduling  policies). 
Once  an  eligible  job  is  found,  the 
scheduler  uses  a  PBS  movejob  request 
to  pull  the  job  from  the  peer  PBS 
server  to  the  local  server.  The  PBS 
scheduler  checks  several  job  require¬ 
ments  such  as  user-requested 
attributes  (e.g.,  a  particular  site  or 
software  package),  the  number  of 
nodes  or  the  type  of  nodes  requested, 
whether  the  time  requested  fits  poli¬ 
cies,  and  the  presence  of  an  active 
user  account  on  the  local  system.  If 
a  job  passes  all  of  these  tests,  it  is 
moved to the local system and run.
The  peer  scheduler  continues  looking 
for  jobs  until  the  utilization  increases 
above  the  predefined  threshold. 
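
The peer scheduling loop described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the PBS scheduler itself: the Job and System classes, their field names, and the 0.8 threshold are assumptions made for the example, and removing a job from the peer's queue stands in for the PBS move-job request.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    name: str
    user: str
    nodes: int                  # number of nodes requested
    hours: float                # wall-clock time requested
    site: Optional[str] = None  # user-requested site, if any


@dataclass
class System:
    name: str
    total_nodes: int
    max_hours: float            # hypothetical local policy: longest job allowed
    accounts: Set[str]          # users with an active local account
    threshold: float = 0.8      # configurable utilization threshold (assumed value)
    running: List[Job] = field(default_factory=list)
    queued: List[Job] = field(default_factory=list)

    def busy_nodes(self) -> int:
        return sum(job.nodes for job in self.running)

    def utilization(self) -> float:
        return self.busy_nodes() / self.total_nodes

    def eligible(self, job: Job) -> bool:
        """Apply the local checks listed in the text to a candidate job."""
        return (
            job.site in (None, self.name)
            and job.nodes <= self.total_nodes - self.busy_nodes()
            and job.hours <= self.max_hours
            and job.user in self.accounts
        )


def peer_schedule(local: System, peers: List[System]) -> List[str]:
    """Pull eligible queued jobs from peers until utilization recovers."""
    moved = []
    for peer in peers:
        for job in list(peer.queued):       # iterate over a copy; the queue is mutated
            if local.utilization() >= local.threshold:
                return moved
            if local.eligible(job):
                peer.queued.remove(job)     # stands in for the PBS move-job request
                local.running.append(job)
                moved.append(job.name)
    return moved
```

A job that names another site, asks for more nodes or time than local policy allows, or belongs to a user without a local account simply stays in the peer's queue.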

Once  a  job  has  been  moved,  the 
peer  scheduler  initiates  any  needed 
file  staging  operations  to  move  files 
required  by  the  job  from  one  sys¬ 
tem  to  the  other.  Users  include 
file-staging  directives  in  the  job 
submission  script.  These  directives 
also  allow  users  to  tell  PBS  where 
they  would  like  to  have  their  output 
files placed. Otherwise, these output
files are returned to the remote
"peer" from which the job originated.
More information about
PBS can be found on the Internet
at http://science.nas.nasa.gov/Software/PBS/.


MetaCenter 


The  CEWES/ASC  MetaCenter  proj¬ 
ect  will  offer  tremendous  potential 
for  the  CEWES  and  ASC  user 
communities.  With  the  work  load 
balanced  across  both  systems,  users 
can  expect  to  see  improved  turn¬ 
around  time  for  their  jobs.  The 
MetaCenter  will  also  offer  more 
flexibility  for  the  user  community. 
Users  will  be  able  to  submit  a  job  at 
their "home" site, and the job
scheduler  will  run  the  job  where  it 
is  most  appropriate.  Users  will  not 
have  to  keep  up  with  which  site  has 
a  particular  software  package,  nor 
will  every  site  need  to  purchase  and 
maintain  all  software  packages. 

During  operating  system  upgrades, 
scheduled  maintenance,  or  almost 
any  system  downtime,  users  may 
continue  to  submit  jobs  to  a  local 
server  and  have  their  jobs  run  auto¬ 
matically  at  the  remote  site, 
providing  much  less  interruption  to 


their  schedules.  By  coordinating 
these  outages,  everyone  profits,  us¬ 
ers  as  well  as  system  personnel,  as 
one  site  learns  from  the  upgrade  ex¬ 
periences  of  the  other.  The 
MetaCenter  staff  also  benefits  as 
the  combined  staff  of  the  two 
MSRCs  provides  a  larger  base  of  ex¬ 
pertise.  Working  together,  the  two 
sites  can  provide  a  more  dynamic 
and user-friendly environment.

Once  this  first  SP  MetaCenter  proj¬ 
ect  is  in  production  and  the  benefits 
are  readily  seen,  other  sites  may 
wish  to  take  advantage  of  the  tech¬ 
nology  or  even  participate  in  the 
effort.  Additional  systems  at  each 
site  may  be  included  in  the  MetaCen¬ 
ter  as  well.  This  endeavor  should 
provide  the  foundation  for  expan¬ 
sion  to  other  similar  initiatives  at 
other  agencies  and  educational 
institutions.


Virtual  Proving 
Ground 


Mine  Plowing  Simulations  on 
the  CEWES  MSRC  HPC  Systems 


David  A.  Horner,  Ph.D. 

Alex  R.  Carrillo 

A  goal  of  the  U.S.  Army  is  to  accom¬ 
plish  Research  and  Development 
tasks  through  modeling  and  simula¬ 
tion  applications.  As  an  example, 
virtual  proving  grounds  (VPGs)  are 
being  developed  to  assist  in  the 
evaluation  of  fielding  new  vehicles 
and  vehicle-mounted  weapon  systems. 
Through  modeling  and  simulation 
of  the  vehicle  components,  VPGs 
will  be  used  to  accomplish  the 
many  tasks  required  to  field  a  new 
system  from  concept  to  prototype. 
The  U.S.  Army  Corps  of  Engineers, 
Waterways  Experiment  Station 


(CEWES)  has  initiated  research  ef¬ 
forts  to  support  the  development  of 
VPGs.  One  such  effort  involves 
one of the top priorities of Engineer
Regiments, the "Grizzly." The
Grizzly is a breaching vehicle capable
of clearing wire obstacles, anti-mine
ditches, log cribs, rubble, and, most
importantly, mines (to a depth of
12 in. at up to 3 mph). Currently,
mine clearing is a major obstacle to
rapid advance forces.

For  the  Grizzly  to  meet  its  potential, 
the  plowing  system  must  maintain 
proper  depth  control  without  inter¬ 
vention  of  the  operators.  Present 
designs  use  a  depth  control  guided 
by  forces  acting  on  the  plow  tines 
(Figures  1  and  2).  The  deeper  the 
plow,  the  higher  the  force  on  the 
tines.  The  force-depth  relationship 
is  obtained  from  a  pre-breach  cali¬ 
bration.  However,  local  variations 
in  the  soil  may  cause  the  plow  to 
dive  too  deep,  causing  the  vehicle 
to  stall,  or  to  dive  too  shallow, 
causing  mines  to  be  missed  or  deto¬ 
nated.  Designing 
the  feedback  mecha¬ 
nism  that  adjusts 
the  plow  depth  for 
tine  force  changes 
to  be  less  sensitive  to 
small  soil  strength 
changes  requires 
field  experiments. 

Field  experiments 

are  also  required  to  develop  alterna¬ 
tive  systems  for  plow  depth  control 
and  alternative  plow  design.  Such 
experiments  are  expensive,  espe¬ 
cially  at  the  multiple  field  sites 
needed  to  verify  the  general  effec¬ 
tiveness  of  the  plow.  A  virtual 
proving  ground  will  greatly  benefit 
the  design  process  by  allowing  the 
designer  to  obtain  realistic,  simu¬ 
lated  plow-tine  performance  data 
that  supplements  limited  field 
experiments.  Computer-generated 
experimental  environments  could  be 
adjusted  to  replicate  a  variety  of 
ground  conditions  representing  di¬ 
verse  geographic  locations.  Multiple 
simulations  can  be  run  through 
identical  soil  to  isolate  effects  of 
operational  parameters.  Sensitivity 
to  variations  in  soil  conditions 
could  be  determined  by  systemati¬ 
cally  varying  the  statistical 
distribution  of  soil  strength. 

One  of  the  keys  to  a  successful 
vehicle  VPG  is  fast  and  realistic 
simulation  of  interaction  at  the 
vehicle-soil  interface.  Typically,  the 
vehicle-soil  interface  can  involve 


large  discontinuous  deformations  of 
the  soil  mass.  Soil  plowing  as  used 
in  the  clearing  of  land  mines,  tire/ 
track  sinkage,  and  the  development 
of  traction  in  loose  soils  during  off¬ 
road  movement  are  examples  of 
items  that  can  cause  the  large  soil 
deformations.  CEWES  has  been 
engaged  in  improving  large  deforma¬ 
tion  modeling  as  it  relates  to  mobility 
problems  for  the  past 
5  years.  Two  major  defi¬ 
ciencies  in  existing 
simulation  technology 
are  evident.  First,  large 
deformation  in  soils  is 
poorly  understood  and 
models  designed  for 
other  engineering  materi¬ 
als  are  not  applicable  to 
soils.  Second,  numerical  methods 
based  on  finite  difference  or  finite 
element  techniques  do  not  capture 
discontinuous  soil  deformation.  There¬ 
fore,  discrete  particle  methods  were 
adopted.  Unfortunately,  particle 




Figure 1. Soil-Tine Interaction. Four-tined mine plow configuration
moving  left  to  right  through  a  soil  mass  of  10  million  particles  which 
has  been  rendered  partially  transparent.  Colors  on  the  plow  indicate 
pressures  exerted  on  the  tines  from  low  (blue)  to  high  (orange/red). 


methods  require  significant  compu¬ 
tational  resources;  thus,  the  entire 
modeling  approach  is  built  around 
the  existence  of  high  performance 
computing  resources. 

Currently,  large  distributed  memory 
parallel  computing  resources  are  be¬ 
ing  utilized  at  the  CEWES  MSRC 
to  study  the  soil-tine  interaction  of 
the  Grizzly.  Figure  1  shows  the 
soil-tine  interaction  for  a  plowing 
simulation.  The  simulation  involves 
10  million  particles  -  the  largest 
particle simulation being run with
non-trivial  boundary  conditions  - 
and  was  run  on  192  processors  of 
the  Cray  T3E.  Similar  runs  have 
been  performed  on  the  IBM  SP. 
Capability  tests  have  been  run  with 
as  many  as  40  million  particles. 
Available  memory  will  allow  runs 
with up to 100 million particles, but
the  required  run  time  makes  this 
scale  of  simulation  prohibitive. 
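
To give a sense of why run time, rather than memory, limits the problem size, the following is a minimal sketch of one step of a discrete particle calculation. It is not the CEWES code: it assumes equal-sized two-dimensional disks, a linear spring contact law, and an all-pairs contact search, and the names contact_forces and step are invented for the illustration. Every time step must revisit the particle contacts, so the cost grows with both the particle count and the number of steps.

```python
import math


def contact_forces(pos, radius, stiffness):
    """Return the net contact force on each particle (2-D, equal radii).

    Overlapping particles push apart with a linear spring force
    proportional to the overlap depth. The all-pairs search is O(N^2);
    production codes use cell lists or similar to cut this down.
    """
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            overlap = 2.0 * radius - dist
            if overlap > 0.0 and dist > 0.0:
                fx = stiffness * overlap * dx / dist
                fy = stiffness * overlap * dy / dist
                forces[i][0] -= fx   # particle i is pushed away from j
                forces[i][1] -= fy
                forces[j][0] += fx   # equal and opposite force on j
                forces[j][1] += fy
    return forces


def step(pos, vel, radius, stiffness, mass, dt):
    """Advance all particles one explicit time step, updating in place."""
    forces = contact_forces(pos, radius, stiffness)
    for p, v, f in zip(pos, vel, forces):
        v[0] += dt * f[0] / mass     # update velocity from the contact force
        v[1] += dt * f[1] / mass
        p[0] += dt * v[0]            # then move with the new velocity
        p[1] += dt * v[1]
```

Because the particles are free to separate and re-contact, discontinuous deformation such as soil parting around a tine falls out of the method without special treatment.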


The  Grizzly  has  the  potential  to 
greatly enhance the effectiveness of
rapid  response  forces  by  simplifying 
breaching  operations  and  reducing 
casualties. Ensuring the Grizzly's
effectiveness  under  varied  field 
situations  requires  extensive  field 
experiments.  The  development  of  a 
virtual  proving  ground  will  enhance 
the  interpretation  of  those  field 
results,  greatly  supplementing  the 
experimental  database.  A  capability 
will  be  established  to  rapidly  simu¬ 
late  plow  performance  in  new 
operational  environments  prior  to 
field  deployment.  Current  work  in 
the  development  of  a  smoothed 
particle  method  has  the  potential 
to  greatly  expand  vehicle-terrain 
simulations  in  Department  of  De¬ 
fense  (DoD)  applications  by 
broadening  problem  sizes.  The 
Grizzly  mine  plow  represents  an 
opportunity  to  bring  these  methods 
to  a  developmental  state  needed  for 
widespread  use  of  particle  methods 
for  soil-vehicle  interaction. 


Figure  2.  Wheel  of  Mine 
Plow.  Pictured  is  an  idealized 
wheel,  rendered  partially 
transparent,  turning  into  a 
10  million  particle  soil  mass 
in  the  direction  of  the  arrow. 
Colors  indicate  stress  exerted 
on  individual  particles  in  the 
soil  from  blue  (low)  to  red 
(high). 
Use of the TANGO Interactive
Collaboratory Tool in the
CEWES MSRC PET Program

David E. Bernholdt, Ph.D.

Nancy J. McCracken, Ph.D.

Marek Podgorny, Ph.D.

Geoffrey C. Fox, Ph.D.

The DoD Research and Development
community is widely distributed
geographically. This makes it a
challenge to give researchers
access to training on high performance
computing systems and
technologies and to professional
education in computational science,
two important aspects of the
Programming Environment and
Training (PET) program. Distance
education is not a new idea, but it
often requires specialized equipment
(such as satellite uplinks and
specialized videoconferencing systems)
or does not provide the level of
interactivity many students desire
(e.g., videotaped lectures). However, it
is now becoming possible to routinely
deliver courses in real time over the
Internet. TANGO Interactive, developed
at the Northeast Parallel
Architectures Center at Syracuse
University, an academic partner in
the CEWES MSRC PET Program,
is a network-based collaborative
tool that is currently being used
for remote education and training
activities and will soon be made
available to users of the CEWES
MSRC for more general collaborative
use.

TANGO  provides  a  framework  that 
allows  applications  to  be  shared  re¬ 
motely  over  the  network,  not  just 
for  education,  but  for  any  kind  of 
remote  collaboration.  The  TANGO 
system  has  a  client-server  architec¬ 
ture  in  which  the  clients  consist  of  a 
control  application  and  a  variety  of 
shared  applications.  The  control 
application  handles  administrative 
functions,  such  as  launching  applica¬ 
tions  and  tracking  which  users  are 
sharing each application. Collaborative
applications send "events" to
the TANGO Server, which then
rebroadcasts them to other clients
sharing the application. Apart from
a few basic functions, the developer
of the application is free to define
whatever shared events are
appropriate to a particular application.
If communication
performance is critical,
clients may also communicate
directly, bypassing
the server. TANGO also
provides a basic form of
"floor control" by keeping
track of a simple
master/slave flag for each instance
of a shared application and providing
a mechanism to grant and
relinquish master status. Again, the
application developer can interpret
this flag as appropriate for the
particular application, so that a chat
tool need not be forced to follow a
master/slave model that does not
make sense. At the same time, a
whiteboard application can choose
to use it to control "passing the
pen" among participants.
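
The event relay and floor-control flag can be illustrated with a small Python sketch. It does not use the TANGO API: the SessionServer class and the whiteboard_draw and chat_say functions are hypothetical names chosen for this example, and the server is reduced to in-memory lists rather than network connections.

```python
from collections import defaultdict


class SessionServer:
    """Toy stand-in for a server that relays events among sharing clients."""

    def __init__(self):
        self.members = defaultdict(list)   # application instance -> client names
        self.master = {}                   # application instance -> current master
        self.inbox = defaultdict(list)     # client name -> events received

    def join(self, app, client):
        self.members[app].append(client)
        self.master.setdefault(app, client)   # first client to join holds master status

    def send(self, app, sender, event):
        """Rebroadcast an event to every other client sharing the application."""
        for client in self.members[app]:
            if client != sender:
                self.inbox[client].append((app, sender, event))

    def grant_master(self, app, holder, new_master):
        """Pass master status on; only the current holder may relinquish it."""
        if self.master.get(app) != holder or new_master not in self.members[app]:
            return False
        self.master[app] = new_master
        return True


def whiteboard_draw(server, app, client, stroke):
    """A whiteboard interprets the flag as 'who holds the pen'."""
    if server.master.get(app) != client:
        return False
    server.send(app, client, stroke)
    return True


def chat_say(server, app, client, text):
    """A chat tool ignores the flag, so anyone may speak."""
    server.send(app, client, text)
    return True
```

The server only records and transfers the flag; whether it restricts anything is decided by each application, which is the design choice the text describes.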




Distance 

Learning 
From the client side, the principal
access to the system is through a
web browser, which allows the control
application and other Java
applets  in  the  package  to  be  loaded 
on  demand.  Although  many  shared 
applications  are  written  in  Java,  they 
need  not  be.  Client  applications 
have  also  been  written  in  C,  C++, 
and  even  Lisp  (for  a  shared  Emacs 
editor).  The  application  program  in¬ 
terface  (API)  for  the  TANGO 
system  is  public,  and  users  are  en¬ 
couraged  to  port  existing  tools  to 
TANGO  or  develop  new  ones  as 
their  needs  require.  TANGO  was 
originally  developed  under  a  con¬ 
tract  from  the  U.S.  Air  Force  Rome 
Laboratory  for  use  in  a  Command 
and  Control/Emergency  Manage¬ 
ment  scenario,  which  included 
shared  visualization  of  terrain  data 
extracted  from  a  GIS  system,  and 


Figure 1. The Architecture of the Tango-Based Distance Education
Project with Jackson State University. The lecturer's shared browser
uses  Tango  to  convey  URLs  for  course  materials  to  student-shared 
browsers,  which  then  retrieve  the  page  from  a  standard  web  server. 


other  groups  are  using  TANGO  as 
the  collaborative  framework  to 
share  a  variety  of  other  applications. 

As  far  as  TANGO  is  concerned,  dis¬ 
tance  learning  is  just  a  special  case 
of  electronic  collaboration,  in  which 
a  shared  web  browser  or  similar 
TANGO  application  is  used  to  show 
course  materials  served  by  a  stand¬ 
ard  web  server.  TANGO  conveys 
the URLs (the shared browser's
"events") to the student browsers
each  time  the  instructor  clicks  on  a 
hyperlink,  and  the  student  browsers 
respond  by  loading  the  page  from 
the  web  server  (Figure  1).  The 
Syracuse  faculty  has  used  this  ap¬ 
proach  to  deliver  three  fully 
accredited,  semester-long  courses  in 
computational  science  to  students 
at  PET  partner  Jackson  State  Uni¬ 
versity  in  Mississippi,  with  several 
CEWES MSRC staff members auditing
this semester's graduate-level
course  at  the  CEWES  MSRC  Train¬ 
ing  and  Educational  Facility.  The 
approach  is  also  being  introduced 
into  the  PET  program  through  a  se¬ 
ries  of  prototype  distance  trainings 
in  collaboration  with  the  Ohio  Su¬ 
percomputer  Center,  another  PET 
partner. 

These  PET  initiatives  are  helping  to 
transfer  collaborative  technologies 
into  the  DoD  Research  and  Devel¬ 
opment  community  to  reduce  the 
"importance of place" in access to
training,  education,  and  general 
collaboration. 
SPLICE: Scalable
Programming  Library  for 
Coupling  Executables 


Brian  Jean 

Recent  trends  in  large-scale  compu¬ 
tational  mechanics  indicate  the  need 
for  a  software  module  that  will 
facilitate  communication  of  data 
between  multiple  codes  running  si¬ 
multaneously. 

Sometimes,  the  set  of  interacting 
software  programs  (referred  to  as  a 
"code set") consists of dissimilar
codes  that  were  never  originally 
intended  to  function  together.  In 
addition,  the  codes  may  be  running 
on  multiple  processors  on  different 
computers.  This  need  for  coupling 
multiple  codes  can  occur  in  a 
number  of  different  application 
areas,  including  fluid  mechanics, 
environmental  quality  modeling, 
structural  mechanics,  and  climate/ 
weather/ocean modeling.

Researchers  at  the  CEWES  MSRC 
are  currently  building  SPLICE,  the 
Scalable  Programming  Library  for 
Coupling  Executables.  SPLICE  will 
have  a  standard  communication 
interface  that  will  simplify  the  cou¬ 
pling  of  application  codes  by 
providing  the  researcher  with  a  set 
of  intuitive,  high-level  library  calls 
for  establishing,  maintaining,  and 
executing  external  communication 
(Figure  1). 

SPLICE  will  abstract  each  code  set 
member  from  the  internal  structure 
and  logic  of  all  other  members. 
Therefore,  once  a  code  set  has  been 
adapted  to  use  SPLICE,  only  mini¬ 
mal  changes  will  be  necessary  for 


each  code  to  function  in  other  code 
sets.  SPLICE  will  communicate  data 
via  simple  calls  to  the  library,  after 
source  and  target  members  indicate 
readiness  to  send  and  receive  data. 
If  any  member  of  a  code  set  is  run¬ 
ning  in  parallel,  SPLICE  will  store 
the  distribution  of  data  exchange 
items  for  each  member  and  update 
the  communication  links  if  a  mem¬ 
ber  repartitions  its  data. 

The  net  effect  of  SPLICE  will  be  to 
shield  the  researcher  from  many  of 
the  intricacies  associated  with  exter¬ 
nal  communication  using  current 
message-passing  libraries. 

Primary  goals  of  the  SPLICE  project 
are to enable communication between
distinct codes running
in parallel, possibly on different
machines, and to produce a "user-level"
interface that will hide
communication details among
codes, allowing a user to establish a
communication link with relative
ease and minimizing the modifications
required for each code.

SPLICE  will  utilize  Message  Passing 
Interface (MPI) calls for communication.
Software such as NASA
MPIRUN or MPI Connect (currently
under development) will be
used  to  establish  a  communication 


protocol  between  codes  or  high 
performance  computing  (HPC) 
platforms.  SPLICE  will  be  object- 
oriented  and  written  in  C++. 
However,  Application  Programming 
Interfaces  will  be  provided  for  both 
C  and  Fortran. 

Initial  testing  of  the  software  began 
during the summer of 1998, with a
beta  release  planned  for  November 
1998. 


Turbulence 

Modeling 


Figure 1. Volume Visualization
of the Temperature
and Vorticity Magnitude.
Instantaneous visualization
of the temperature distributions
(color) superimposed
on isosurfaces of the vorticity
magnitude (gray) for a reactive
propane/air jet emerging into
an air background. Flow direction
is from bottom to top.


Scalable  Parallel  Simulation  of 
Turbulent,  Noncircular  Jets 


V  W.  Bova,  Ph.D. 

Fernando Grinstein, Ph.D.

Alan  Stagg,  Ph.D. 

The  U.S.  Navy  is  interested  in  im¬ 
proving  the  efficiency  of  jet  and 
rocket  engines  used  in  missiles  and 
other  aircraft.  Traditionally,  jet  ex¬ 
hausts  have  been  designed  using 
circular  nozzles.  Today,  rectangular 
jets  are  of  special  interest  because 
they  have  certain  characteristics  that 
may  be  exploited  to  improve  the 
combustion  process.  For  example, 
passive  combustion  control  strategies 
are  based  on  geometrical  modifica¬ 
tion  of  the  jet  nozzle  to  manipulate 
the  natural  development  of  large- 
scale  vortices  and  their  breakdown 
into  turbulence  to  enhance  entrain¬ 
ment  and  mixing.  If  the  turbulent 
mixing  in  the  flow  can  be  increased, 
then  the  reactants  can  be  mixed 
more  thoroughly,  thereby  giving 
more  complete  combustion.  Com¬ 
pared  to  circular  jets,  rectangular 
jets  offer  passively  improved  mixing 
at  both  ends:  enhanced  large-scale 
entrainment  due  to  axis-switching 


and  enhanced  small-scale  mixing 
due  to  faster  transition  to  turbu¬ 
lence.  The  code  described  below, 
NSTURB3D,  simulates  these 
processes  and  allows  researchers  to 
focus  on  understanding  the  dynam¬ 
ics  and  topology  of  the  following 
phenomena:  jet  entrainment,  mix¬ 
ing,  and  combustion,  as  well  as  their 
dependence  on  the  aspect  ratio  of 
the  jet  cross  section.  These  simula¬ 
tions  elucidate  the  operative  fluid 
dynamic  mechanisms  involved  in 
the  transition  to  turbulence  and  pro¬ 
vide  an  improved  basis  for 
conceptual  and  analytical  modeling 
of  turbulent  jet  combustion  (Fig¬ 
ures  1  and  2). 

NSTURB3D  is  a  FORTRAN  program 
that  simulates  the  turbulent  mixing 
of  a  compressible,  three-dimensional, 
space/time developing rectangular
jet  with  its  surroundings.  The 
numerical  model  is  based  on  the 
solution  of  the  time-dependent  flow 
equations.  In  order  to  effectively 
emulate  the  practical  flow  regimes, 
the  required  simulations  consume 
large amounts of computer memory
and  CPU  time.  Current  simulations 
being  run  on  the  CRAY  C90  in¬ 
volve  thousands  of  time-steps  with 
18 million unknowns per time-step.

One way to accelerate large-scale
simulations  such  as  these  is  to  use 
scalable,  high-performance  platforms, 
such  as  those  available  at  the 
CEWES  MSRC,  which  have  hun¬ 
dreds  of  application  processors. 
Furthermore,  these  machines  have 
at  least  an  order  of  magnitude  more 
available  memory  than  the  C90.  In 
order  to  exploit  these  features,  a 
parallel,  distributed-memory  imple¬ 
mentation  of  the  existing  serial  code 
is  necessary  and  is  therefore  being 
developed  at  CEWES.  With  such 
an  implementation,  even  larger 
simulations  can  be  performed  in  a 
timely  fashion. 

NSTURB3D  uses  an  explicit  second- 
order,  predictor-corrector  time 
integration  scheme  and  flux-corrected 
transport  (FCT)  algorithm  to 
approximate  the  compressible 
Navier-Stokes  equations  on  a 
three-dimensional  Cartesian  grid. 
Orientation  in  the  jet  is  expressed 
with  respect  to  the  streamwise  and 
cross-stream  directions.  The  FCT 
algorithm  sweeps  through  the  grid 
by  performing  a  two-dimensional 
solve  on  the  cross  planes  and  a  one¬ 
dimensional  solve  in  the  streamwise 
direction. 

The  parallel  algorithm  proceeds  as 
follows.  First,  the  grid  and  its  asso¬ 
ciated  data  structures  are  statically 
partitioned.  Since  there  is  more 
computational  work  associated  with 
the  cross  plane  solves  than  with  the 
one-dimensional  streamwise  solves, 
a  two-dimensional  decomposition 
of  the  cross  plane  is  performed. 

The  decomposition  is  further  con¬ 
strained  to  ensure  load  balance  by 


specifying an approximately equal
number of grid cells per processor.
The resulting subdomains overlap
by a layer of "ghost cells," which
must  be  exchanged  throughout  the 
solution  process  by  explicit  commu¬ 
nication.  This  communication  is 
performed  with  the  MPI  library  for 
portability.  The  use  of  Fortran  90 
modules  allows  the  details  of  the 
message-passing  implementation 
to  be  hidden  from  the  user  and 
promotes  software  reuse  and  ease 
of  maintenance.  The  parallel 
implementation  is  complete  and  is 
currently  being  verified  on  three 
CEWES  MSRC  scalable  platforms: 
the  IBM  SP,  the  CRAY  T3E,  and 
the  SGI/CRAY  Origin  2000. 
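
The static partitioning step can be sketched as follows. This is a Python illustration only, not NSTURB3D (which is written in Fortran and uses MPI): the function names split and decompose, the dictionary layout, and the single ghost layer are assumptions made for the example. It reads the text as leaving the streamwise direction undivided, so only the two cross-plane directions are partitioned.

```python
def split(n, parts):
    """Split n cells into contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)   # spread the remainder over the first ranges
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(ny, nz, py, pz, ghost=1):
    """Partition an ny-by-nz cross plane over a py-by-pz processor grid.

    For each processor, returns the index ranges it owns and the same
    ranges widened by `ghost` layers (clipped at the physical boundary).
    The widened cells are the ones that must be filled by communication
    with neighboring processors.
    """
    subdomains = {}
    for j, (y0, y1) in enumerate(split(ny, py)):
        for k, (z0, z1) in enumerate(split(nz, pz)):
            subdomains[(j, k)] = {
                "owned": ((y0, y1), (z0, z1)),
                "with_ghosts": (
                    (max(y0 - ghost, 0), min(y1 + ghost, ny)),
                    (max(z0 - ghost, 0), min(z1 + ghost, nz)),
                ),
            }
    return subdomains
```

Keeping the range sizes within one cell of each other in each direction is one simple way to satisfy the load-balance constraint; in a message-passing code each processor would then exchange its ghost layers with its neighbors at every stage of the solve.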

After  verification,  simulations 
which  require  the  solution  of  many 
thousands  of  systems  of  equations 
that  have  30  to  40  million  un¬ 
knowns  each  will  be  performed  on 
50 to 100 processors.


Figure 2. Iso-surfaces of the Vorticity Magnitude.
Red indicates regions of high vorticity. The square jet
(at the bottom of the figure) is initially laminar, and
the transition to turbulence takes place along the jet
axis towards the top of the figure. Labeled features:
"worm" vortices, hairpin (braid) vortices, and vortex rings.
Panoram System Augments
High-Yield  Visualization 
Capabilities 


John  E.  West 

As  the  computational  resources 
available  to  users  of  the  CEWES 
MSRC  continue  to  expand,  so  do 
the  complexity  and  size  of  prob¬ 
lems  solved.  While  very  large  data 
sets  have  become  more  common¬ 
place  in  recent  years,  techniques  for 
managing  these  data  sets  are  only 
just  now  beginning  to  evolve.  One 
problem  has  been  that  visualiza¬ 
tions  of  these  data  sets  had  to  be 
displayed  on  workstation  monitors. 
Even  large  monitors  with  1024  X 
1280  pixels  may  have  a  relatively 
low  resolution  when  compared  with 
the  amount  of  data  to  be  displayed. 


Panoram  Technologies  GVR-120  Reality  Centre.  The  GVR-120  system  is  an  arrayed 
video  projector  system. 


This  results  in  the  need  for  subsam¬ 
pling  or  complex  multi-resolution 
projection  methods;  however,  sub¬ 
sampling  may  result  in  a  substantial 
loss  of  information,  while  multi¬ 
resolution  methods  can  be  slow  or 
difficult  to  implement.  With  the 
addition  of  the  Panoram  Technolo¬ 
gies  GVR-120  Reality  Centre  to  the 
scientific  visualization  resources 
available  at  the  Information  Tech¬ 
nology  Laboratory  (ITL),  it  is  now 
possible  to  visualize  data  sets  at  a 
higher  resolution  than  previously 
possible. 

The  GVR-120  system  is  an  arrayed 
video  projector  system.  The  system 
is  driven  by  a 
Silicon  Graphics 
Inc.  (SGI) 

Power  Onyx 
with  Infinite 
Reality  graphics, 
four  R10000 
processors, 

3  Gbytes  of 
RAM  and  ap¬ 
proximately 
30  Gbytes  of 
disk  space.  The 
Power  Onyx  is 
connected  via 
ATM,  FDDI, 
and  Ethernet  to 
other  MSRC  and 
ITL  resources, 
ensuring  that 
data  can  be 
transferred 
efficiently  from 
wherever  it  was 
generated. Output from the Power
Onyx  is  directed  through  custom 
hardware  and  three  Electrohome 
projectors  onto  a  271.5-in.  by  68-in. 
curved  display.  This  configuration, 
called  the  MegaDesktop™,  provides 
a 3200 X 1024 pixel viewable area to
the  user.  In  this  mode  the  system 
behaves  just  like  an  extremely  large 
monitor,  providing  the  ability  to 
size  and  move  windows  anywhere 
on  the  screen.  Software  that  runs 
on  the  SGI  is  inherently  compatible 
with  the  system  and  can  be  run 
without  any  modifications,  enabling 
use  of  both  custom  and  commercial 
scientific  visualization  software 
tools. 

The Panoram display system supports a variety of other display modes and input devices when not in the MegaDesktop™ configuration. In addition to the SGI Power Onyx, the system is configured to accept input from various video sources, personal computers, and document display systems. Furthermore, the sources can be combined in a variety of configurations using the library of display parameters provided with the system. Thus, up to three different sources can be displayed on the screen at once, or any two may be displayed in a variety of arrangements at one time. The system can also be used to create high-resolution virtual environments. The curved display fills approximately 120 degrees of the user's field of view and is equipped to produce stereo output when used with StereoGraphics CrystalEyes™ glasses. The conference room in which the system is located has a high-fidelity audio system for a complete sense of immersion.


Due to its resolution and size, the display is well suited to conveying information to a large audience, and the flexibility of the system allows application designers to create a high-yield visualization experience. Several applications recently created by the CEWES MSRC for this environment use two of the three projectors (or two-thirds of the screen) to visualize data while a companion video is displayed on the remaining projector. This configuration permits the application scientist or visualization specialist to communicate the nature of the problem being addressed in the video portion of the display area and move directly to visualization of the data without having to shift the viewer's focus. Furthermore, the side-by-side display permits direct comparison of the results of experiments and simulations while maintaining high resolution.

The large, curved display also makes the Panoram system well suited to CADD/GIS applications, including architectural walk-throughs and assessment of engineering design alternatives at the conceptual stage. The Scientific Visualization Center staff has a long history of producing highly effective, professional quality animations of engineering concepts using industry standard packages such as Maya™ from Alias|Wavefront. Typically these models are produced originally as part of an animation production. However, a significant side benefit of the model building process in these packages has been the ability to interactively explore these models using custom applications created by the Scientific Visualization Center.


Developing a FORTRAN API to the Pthreads Library


Henry  A.  Gabb,  Ph.D. 

Clay P. Breshears, Ph.D.

S. W. Bova, Ph.D.

Pthreads is a POSIX standard established to control the spawning, execution, and termination of multiple tasks within a single process. Concurrent tasks are assigned to independent threads. At all times only a single process exists within a single address space, although multiple processors may be employed to execute the various threads. Even on a single-processor computer, however, Pthreads programs are often more efficient due to better resource utilization. For example, one thread can perform calculations while another handles input/output (I/O); computation and I/O can usually be done simultaneously, even on a single processor.
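The overlap of computation and I/O described above can be sketched in a few lines. The example below is only an illustration written in Python, whose `threading` module creates operating-system threads; it is not Pthreads code, and the names `compute_chunk`, `writer`, and `run` are hypothetical. One thread writes finished results to a file while the main thread keeps computing.

```python
import os
import queue
import tempfile
import threading


def compute_chunk(i):
    """Stand-in for a numerical kernel: sum of squares over one block."""
    return sum(k * k for k in range(i * 1000, (i + 1) * 1000))


def writer(results, path):
    """I/O thread: take results off the queue and write them to a file."""
    with open(path, "w") as out:
        while True:
            item = results.get()
            if item is None:      # sentinel: the compute thread is done
                break
            out.write(f"{item}\n")


def run(n_chunks):
    """Compute n_chunks blocks while a second thread handles the output."""
    results = queue.Queue()
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    io_thread = threading.Thread(target=writer, args=(results, path))
    io_thread.start()
    computed = []
    for i in range(n_chunks):
        value = compute_chunk(i)
        computed.append(value)
        results.put(value)        # hand off to the I/O thread, keep computing
    results.put(None)
    io_thread.join()              # wait for the writer to finish
    with open(path) as f:
        written = [int(line) for line in f]
    os.remove(path)
    return computed, written


if __name__ == "__main__":
    computed, written = run(8)
    print(computed == written)
```

The queue is the only point of contact between the two threads, so the compute loop never waits on the disk except when the queue hand-off itself blocks.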


The  Fortran  API  to  Pthreads  was  used  to  parallelize  a  high-energy 
impact  particle  dynamics  code.  Courtesy  of  U.S.  Air  Force  Research 
Laboratory. 


As useful as Pthreads is for parallel programming on shared-memory computers, a FORTRAN interface is not defined by the POSIX standard. However, the technical barriers to implementing such an application program interface (API) are not insurmountable. First, the library is small, consisting of only 61 routines, which can be loosely classified into three categories: thread creation, termination, and manipulation; synchronization; and scheduling. The last category must interact with the system and represents the lowest level of the library. As such, the bindings to these routines are difficult to test.

Another technical difficulty is that Pthreads makes extensive use of C structures. The data in these structures are only manipulated during calls to the library. The application program rarely needs to access the data directly. So rather than trying to pass FORTRAN derived types into C structures, only the locations are communicated between the application and the interface. These memory addresses are declared private to the interface so that the application cannot inadvertently lose track of where the structures are located.

The programmer must also pay close attention to variable scope since the rules for scoping are different in FORTRAN and C. This is particularly important since each thread has its own stack space in addition to global memory. Another slight difference is that FORTRAN interfaces with the Pthreads library through subroutine calls instead of function calls. For example, a C thread obtains its identification code as follows:

my_id = pthread_self();

whereas a FORTRAN thread gets its ID as follows:

call fpthread_self( my_id )

Notice that the routine names are slightly different. This naming convention is similar to that of Parallel Virtual Machine (PVM).

The interface consists of two files. The first is a FORTRAN 90 module containing some necessary constants and structure definitions. These structures (which are really FORTRAN 90 derived types) are completely artificial; they contain memory addresses instead of data. The other file contains bindings to the Pthreads library along with the necessary include files. These bindings are written in the C programming language. The API is a useful tool for parallel programming. The FORTRAN API to the Pthreads library was completed by engineers and scientists at the CEWES MSRC, and the interface, including documentation, should be available by November 1998. Anyone interested in obtaining the API should contact the Computational Migration Group through the CEWES MSRC Customer Assistance Center at 1-800-500-4722 or http://wes.hpc.mil/.

Tips to Improve Start-Up Time for Parallel Jobs on the IBM SP


K Phillip Bording, Ph.D.

The  CEWES  MSRC  team  members 
offer  the  following  suggestions  for 
improving  system  performance. 

Starting a Job That Uses Many Processors

In order for a job to start on each of the IBM SP compute nodes, a copy of the executable code must reside in a directory of a disk file system and be memory mappable (executable codes located in PIOFS-based file systems are not memory mappable). When a job manager activates a job, it does so from some disk file system directory. The file system directories on an SP compute node are either local ($SCRATCH) or remote ($HOME, $HOMEDIR, $WORKDIR). If the user starts a job with an executable code that resides in a remote directory, the remote high availability file server (HAFS) must deliver the blocks associated with that code, directly impacting the job load time. For SP jobs using just a few compute nodes, this start-up time is nominal. However, for jobs that use a large number of compute nodes (64 or more), the time required to move multiple copies of the executable code can be very long.

One solution is to exploit the compute node's local file system, $SCRATCH. The executable code can be placed in $SCRATCH for the life of the job. The high-speed remote $WORKDIR file system (either GPFS or PIOFS based) is attached to all of the nodes that comprise the SP system and can act as an intermediate transfer mechanism.

The methodology is this: perform a single remote file copy operation, copying the executable (i.e., "program_binary") from $HOME or $HOMEDIR to $WORKDIR. Next, perform a parallel copy operation, copying the program_binary from $WORKDIR to $SCRATCH on each of the allocated SP compute nodes. Execute the copy of the program binary located in $SCRATCH on each node. Finally, and most importantly, clean up $SCRATCH so that successive jobs will have adequate disk space.

Experiments at the CEWES MSRC have shown that this method dramatically reduces the time to start large jobs. Additionally, the variation in timing due to moving initial memory pages from $HOME is greatly reduced.

The following code illustrates how to perform this job staging to $SCRATCH via $WORKDIR:

Main_Script:

#!/bin/ksh
#PBS -l nodes=64,walltime=03:00:00
#PBS -j oe
#PBS -V

# prepare for parallel copy
cp -p $HOME/program_binary $WORKDIR/program_binary

# perform parallel tasks
pbspoe $HOME/Pbspoe_Script

# clean up
rm $WORKDIR/program_binary

Pbspoe_Script:

#!/bin/ksh

# distribute binary to individual compute nodes
cp -p $WORKDIR/program_binary $SCRATCH/program_binary

# execute binary
$SCRATCH/program_binary

# clean up
rm $SCRATCH/program_binary


The Main_Script is submitted to the PBS server via the "qsub" command. The Pbspoe_Script is called by the Main_Script and represents those tasks to be carried out, in parallel, on the allocated compute nodes.

Programming  the  Cache 

The SP nodes have local cache memory space that is not user programmable. This means that the user or compiler cannot issue computer instructions that directly control the cache. However, the user can benefit by understanding how the cache works.

The primary idea is to arrange data movement so that the maximum amount of data stays in the cache for reuse. By keeping as much data as possible in the cache, the processor delays that are associated with memory access are greatly reduced, and program performance can be improved.
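One common way to keep data in the cache for reuse is to process an array in small tiles (loop blocking). The sketch below shows only the access pattern; it is written in Python for illustration, so it will not show the speedup a compiled FORTRAN loop could see, and the function name `transpose_blocked` and the default tile size are assumptions rather than anything prescribed for the SP.

```python
def transpose_blocked(a, n, block=4):
    """Transpose an n x n matrix stored row-major in a flat list, tile by tile.

    Each block x block tile is read and written completely before moving on,
    so the rows and columns it touches are reused while they are still
    likely to be resident in the cache.
    """
    out = [0] * (n * n)
    for ii in range(0, n, block):                      # tile row origin
        for jj in range(0, n, block):                  # tile column origin
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out
```

In a real code the tile size would be chosen so that the tiles being read and written fit in the data cache at the same time.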

The SP has quad-word load and store instructions that can be used by the code generator if the cache is properly described to the FORTRAN compiler.

When compiling FORTRAN with mpxlf, the following cache options should be used with the compiler:

mpxlf program.f -o program -O3 -qstrict -qarch=pwr2 -qtune=pwr2 -qhot -qcache=assoc=4:cost=38:level=1:line=256:size=128:type=d

Note that all of the data specified are essential to describe the cache. Also, these data apply to the POWER2 SuperChip, which is on all the SP nodes at the CEWES MSRC.

Recent Performance Study on CEWES IBM SP Demonstrates Speed of New Chip


K Phillip Bording, Ph.D.

A scalability and performance study showed that ENSAERO MPI, a NASA Ames Research Center software code used for complicated computer-generated visualizations, runs almost 2.5 times faster on the newer 135 MHz POWER2 processors in the IBM SP at the CEWES MSRC than on the previous 66.7 MHz processors.


ENSAERO MPI is a parallelized, high-fidelity, multi-block, multidisciplinary code with fluids, structures, and controls capabilities developed at NASA Ames Research Center. ENSAERO MPI is capable of computing aeroelastic responses by simultaneously integrating the Navier-Stokes equations and the finite element structural equations using aeroelastically adaptive, dynamic grids (Figure 1).


"This speedup is due to the use of the new and expanded 135 MHz POWER2 chips used at the CEWES MSRC relative to the 66.7 MHz chips," said Mehrdad Farhangnia, MCAT Inc., NASA Ames Research Center.

"This is one illustration of how DoD high performance computing centers, like the CEWES MSRC, are committed to providing the latest in computing technology to the DoD user community," said Dr. N. Radhakrishnan, CEWES MSRC Site Manager and Director of the CEWES Information Technology Laboratory.

"Running at approximately 40 million floating point operations per second (MFLOPS) per processor, this corresponds to 9 GFLOPS performance, the highest achieved to date by ENSAERO MPI and nearly 2.5 times faster than the maximum achievable performance on the 66.7 MHz processors by this code," Farhangnia said.


The major systems making up the core of the CEWES MSRC are an IBM SP, a Silicon Graphics Incorporated (SGI) Origin 2000, a CRAY T3E, a CRAY C90, and a 100-Terabyte mass storage archival system. The center provides classified and unclassified scientific visualization services, on-site computational assistance, and a fully staffed user services department.

Figure 1. The new 135-MHz processor enables faster turnaround for researchers studying F-18 aeroelastic tail responses. Photos courtesy of the U.S. Navy.


The code is parallelized on a coarse grain level using MPI. The different disciplines are solved independently on separate nodes, with the flow domain partitioned further into a number of subdomains. There is also multi-parameter parallelization, where various parameter sets are run concurrently for a particular configuration.

MSRC Journal | Fall 1998

The model being studied is a wing-body-empennage configuration, which consists of a single block H-O grid with 180 x 173 x 40 points in the streamwise, spanwise, and body normal directions, respectively. The grid is split into multiple, equally sized zones cut perpendicular to the streamwise direction, with each zone assigned to a separate processor. Timing functions are utilized to exclude initialization and input/output (I/O) CPU usage; thus only the solver portion of the code is represented.
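The zone decomposition can be made concrete with a short sketch. It is a simplified illustration in Python, not taken from ENSAERO MPI: the function name `streamwise_zones` is hypothetical, and the sketch ignores any overlapping planes a real multi-zone solver would need at zone boundaries. The zone counts 9, 18, and 36 are the ones reported in this study.

```python
NI, NJ, NK = 180, 173, 40   # streamwise, spanwise, body normal points


def streamwise_zones(n_points, n_zones):
    """Return (start, stop) index ranges of equal slabs along the streamwise axis."""
    if n_points % n_zones != 0:
        raise ValueError("zones must divide the streamwise points evenly")
    width = n_points // n_zones
    return [(z * width, (z + 1) * width) for z in range(n_zones)]


def points_per_zone(n_zones):
    """Grid points owned by one zone (and therefore by one processor)."""
    start, stop = streamwise_zones(NI, n_zones)[0]
    return (stop - start) * NJ * NK
```

With 180 streamwise points, all three zone counts divide evenly, giving slabs of 20, 10, and 5 streamwise planes, respectively.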

The multiple parameter set plot showed the scalability of the code relative to a 9-zone case for steady flow computations. The code was scaled up to 25 parameter sets (225 nodes) on the CEWES MSRC machine. The single parameter set plot showed the performance of splitting the volume grid into 18 and 36 zones. The difference in processor speed was evident here as the CEWES MSRC machine performed 50 percent faster than the 66.7 MHz processor on the 9-zone case (Figure 2).
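The figures quoted in this article can be cross-checked with simple arithmetic. The sketch below, in Python, assumes that the 9 GFLOPS figure refers to the 225-node run with one processor per node, each sustaining roughly 40 MFLOPS; that pairing is an inference from the numbers, not something stated explicitly. The constant and function names are hypothetical.

```python
GRID = (180, 173, 40)        # streamwise, spanwise, body normal points
ZONES_PER_SET = 9            # zones (processors) per parameter set
PARAMETER_SETS = 25
MFLOPS_PER_PROCESSOR = 40    # approximate sustained rate per processor


def grid_points(dims):
    """Total number of points in a structured grid."""
    total = 1
    for n in dims:
        total *= n
    return total


def total_nodes(parameter_sets, zones_per_set):
    """Nodes used when every zone of every parameter set has its own node."""
    return parameter_sets * zones_per_set


def aggregate_gflops(nodes, mflops_per_node):
    """Aggregate rate in GFLOPS for nodes running at the same per-node rate."""
    return nodes * mflops_per_node / 1000.0
```

Under these assumptions the grid holds 1,245,600 points, the 25 parameter sets occupy 225 nodes, and the aggregate rate comes to 9 GFLOPS.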

"Overall, the performance of the CEWES MSRC SP has been very encouraging in both per node performance and scalability, as well as turnaround times. Production runs are now in the works for an 18-block, F-18 wing-body-empennage case, where dynamic aeroelastic responses of the tail will be analyzed," Farhangnia said.


Figure 2. Scalability and performance of ENSAERO-MPI on the SP. Coarse grain parallelization; wing-body-empennage configuration; 1,245,600 grid points.

Adapting AVS to HPC Data


Kent Eschenberg, Ph.D.

Many visualization projects can benefit from using commercial, off-the-shelf (COTS) software packages. However, some data sets, including most encountered at the CEWES MSRC, do not quite fit into the input file model supported by these packages. The CEWES MSRC Scientific Visualization Center (SVC) has developed custom input modules for the AVS/Express Visualizer package that convert the HPC data to the model used by AVS.

The  Data  Model 

Within the AVS visualization system, data are stored as a field, a self-describing data structure that includes dimensions, units, labels, coordinates, and data values. A field can be structured (with various levels of coordinate detail) or unstructured. An unstructured field may contain any number of arrays of cells, called cellsets, where each cellset is a specific geometry type: point, line, triangle, quadrilateral, prism, pyramid, tetrahedron, or hexahedron. One or more data values (for example, salinity) can be "attached" to the vertices, the cell centers, or both.
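The shape of such a self-describing field can be sketched as a small data structure. The Python classes below are a hypothetical illustration only; `CellSet`, `Field`, and `add_node_data` are invented names and do not reproduce the AVS field format or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CellSet:
    """An array of cells that all share one geometry type."""
    cell_type: str                          # e.g. "point", "line", "triangle"
    connectivity: List[Tuple[int, ...]]     # vertex indices, one tuple per cell


@dataclass
class Field:
    """Unstructured field: coordinates, cellsets, and labeled vertex data."""
    coordinates: List[Tuple[float, float, float]]
    cellsets: List[CellSet] = field(default_factory=list)
    node_data: Dict[str, List[float]] = field(default_factory=dict)
    units: Dict[str, str] = field(default_factory=dict)

    def add_node_data(self, label, values, unit=""):
        """Attach one value per vertex under a label such as 'salinity'."""
        if len(values) != len(self.coordinates):
            raise ValueError("one value per vertex is required")
        self.node_data[label] = list(values)
        self.units[label] = unit
```

A field with three vertices, one triangle cellset, and a salinity value at each vertex would then be built by creating a `Field`, appending a `CellSet("triangle", [(0, 1, 2)])`, and calling `add_node_data("salinity", [...])`.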

The custom input modules described below have a simple goal: read a data file and produce a field. Each module is designed to read a specific type of file and produce a specific type of field. The actual visualization modules (such as those that create isosurfaces and cut planes) are designed to work with most types of fields; therefore, a user can work with the same visualization options with many different types of fields.

Working with Time-Varying Data

Data sets resulting from simulation on high performance computers vary over time. Early versions of AVS did not support time-varying data. The version currently in use at CEWES MSRC, Visualizer 3.4, has several problems with time-varying fields. More recent versions of the software may provide excellent support for time-varying data.

Thus, a locally developed approach has been used for the management of time-varying fields. Each input module contains three subroutines: the "reader," which reads the input files and saves all of the data in memory; the "slicer," which extracts the data at one particular time and creates a field; and the "releaser," which frees the memory used to store the data when the module is deleted.
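The division of labor among the three subroutines can be sketched as follows. This is a hypothetical Python illustration, not the SVC module code: the class name `TimeVaryingInput`, the one-record-per-line text format ("time value value ..."), and the dictionary returned by `slicer` are all assumptions made for the example.

```python
class TimeVaryingInput:
    """Holds every time-step in memory and hands out one step at a time."""

    def __init__(self):
        self._steps = None

    def reader(self, lines):
        """Parse 'time value value ...' records and keep all steps in memory."""
        steps = {}
        for line in lines:
            parts = line.split()
            if not parts:                     # skip blank records
                continue
            steps[float(parts[0])] = [float(v) for v in parts[1:]]
        self._steps = steps

    def times(self):
        """Sorted list of the time-steps currently held in memory."""
        return sorted(self._steps) if self._steps else []

    def slicer(self, t):
        """Extract the data at one time and package it as a simple field."""
        return {"time": t, "values": list(self._steps[t])}

    def releaser(self):
        """Drop the stored data so the memory can be reclaimed."""
        self._steps = None
```

Stepping through time then only calls `slicer`, which copies out data already in memory instead of going back to the input files.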

The reader is often enhanced to provide some other services. For example, some readers include "filters" that can be set by the user to eliminate certain categories of data before they are even stored in memory. This can be critical to working with some of the very large HPC data sets. Many visualization systems (including AVS) implement such a filter by first reading everything into memory and then making a copy of the parts of the field that are of specific interest.

Another example of a special service provided by the reader is interpolation to provide finer time-steps or to smooth out irregularly spaced time-steps. While many visualization packages can perform this sort of function as they step through time, the approach used in the SVC custom modules is to perform the interpolation once, during the read phase, instead of every time the interpolated time is accessed.
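Interpolating once at read time can be sketched as a resampling step that runs before any field is created. The Python function below is a minimal illustration assuming linear interpolation, strictly increasing input times, and a single scalar value per time-step; the name `resample` is hypothetical, and the interpolation scheme actually used in the SVC modules is not specified here.

```python
def resample(times, values, step):
    """Linearly interpolate (times, values) onto a uniform grid of spacing step."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value samples")
    n = int((times[-1] - times[0]) / step + 1e-9) + 1
    out_t, out_v = [], []
    k = 0                                    # index of the bracketing interval
    for i in range(n):
        t = times[0] + i * step
        while k < len(times) - 2 and times[k + 1] < t:
            k += 1                           # advance to the interval holding t
        t0, t1 = times[k], times[k + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append((1.0 - w) * values[k] + w * values[k + 1])
    return out_t, out_v
```

The resampled arrays are stored in place of the raw ones, so stepping through time afterwards is a plain lookup.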


Figure  1.  Particle  flow  for  the  RFW  visualization  at  a  time-step 
soon  after  the  system  has  been  started. 


Figure  2.  The  full  three-dimensional  particle  set  from  the  RFW 
simulation  with  transparent  structures  to  improve  the  view  of 
interior  areas. 


Example  Projects 

The Radio Frequency Weapon (RFW) project is using HPC simulations to develop high-power microwave emitters. The simulation tracks the location of electrons as they are emitted and oscillate inside the device. The raw data files can occupy as much as 2 gigabytes of storage on the HPC disk. A data reduction scheme was developed to reduce the size of this data set before it could be visualized. First, a custom preprocessor on the HPC filters and condenses the data into a special file format; second, a custom AVS input module reads this file. The field produced by this module consists of unstructured points that can be displayed as pixels or spheres (Figures 1 and 2).

After the RFW input module was completed, two other CEWES MSRC projects emerged that were very similar. In one case, the salinity of the water in a bay at about 500 time-steps needed to be analyzed. In the other case, the probability of structural failure as a function of three variables and time needed to be viewed. In both cases, a new custom input module was developed by making modest changes to the RFW input module.

Conclusion 

Using COTS visualization packages can greatly reduce the time needed to complete a project and can help improve the quality of the results by allowing the developers to work with similar visualization options from project to project. With the right approach, writing custom input modules can be an efficient and elegant solution with a large amount of module reuse.

Preprints

Preprints are available from the CEWES MSRC Customer Assistance Center or on the CEWES MSRC web site at http://wes.hpc.mil/pet/CEWES/CEWES_reports.html. For further information, contact the Customer Assistance Center by telephone at 800-500-4722 or by e-mail at info-hpc@wes.hpc.mil.
For further information, contact:

CEWES MSRC
Customer Assistance Center

Web site
http://wes.hpc.mil

E-mail at info-hpc@wes.hpc.mil

Telephone at 800-500-4722
Vorticity Dynamics in a Square Air Jet Emerging into Air Background

Improving the mixing of a jet (or plume) with its surroundings is of considerable interest in practical applications demanding enhanced performance of combustors in missile and other Navy aircraft propulsion systems. Geometric modifications to a jet nozzle can efficiently provide such improvements by directly affecting the formation of large-scale vortices and their breakdown into turbulence. The subsonic square jet in the figure evolves from laminar initial conditions (bottom of the figure). A ray-tracing technique is used to visualize the vorticity magnitude, ranging from semi-transparent blue to opaque red. The square jet development is characterized by the dynamics of strongly interacting vortex rings and hairpin (braid) vortices in the near jet, and by elongated "worm" vortices in the turbulent region downstream. This simulation involved nearly 10 million grid points. Performing larger simulations demands porting the existing serial code to a distributed-memory implementation on scalable, high-performance platforms.